Geography

Datashader is a general-purpose tool for rasterizing (and re-rasterizing) data of many different types. To make it easier to apply this general functionality to the particular domain of geoscience, Datashader provides a few geospatial-specific utilities as well:

This notebook explains each of these topics in turn. See also GeoViews, which is designed to work with Datashader to provide a large range of additional geospatial functionality.

Project points

You can use GeoViews or the underlying pyproj/proj.4 libraries to perform arbitrary projections to and from a large number of different coordinate reference systems. However, for the common case of wanting to view data with latitude and longitude coordinates on top of a Web Mercator tile source such as Google Maps or OpenStreetMap, Datashader provides a simple self-contained utility lnglat_to_meters(longitude, latitude) to project your data once, before visualization. For instance, if you have a dataframe with some latitude and longitude points stretching from San Diego, California to Bangor, Maine:

In [1]:
import numpy as np, pandas as pd
from datashader.utils import lnglat_to_meters

San_Diego = 32.715, -117.1625
Bangor = 44.8, -68.8
n = 20

df = pd.DataFrame(dict(longitude = np.linspace(San_Diego[1], Bangor[1], n),
                       latitude  = np.linspace(San_Diego[0], Bangor[0], n)))

Then you can create new columns (or overwrite old ones) with the projected points in meters from the origin (Web Mercator coordinates):

In [2]:
df.loc[:, 'x'], df.loc[:, 'y'] = lnglat_to_meters(df.longitude,df.latitude)
df.tail()
Out[2]:
longitude latitude x y
15 -78.981579 42.255789 -8.792189e+06 5.199373e+06
16 -76.436184 42.891842 -8.508837e+06 5.295524e+06
17 -73.890789 43.527895 -8.225485e+06 5.392671e+06
18 -71.345395 44.163947 -7.942133e+06 5.490849e+06
19 -68.800000 44.800000 -7.658781e+06 5.590090e+06

The new x and y coordinates aren't very useful for humans to read, but they can now be overlaid directly onto web map sources, which are labeled with latitude and longitude appropriately by Bokeh (via HoloViews) but are actually in Web Mercator coordinates internally:

In [3]:
import holoviews as hv
from holoviews.operation.datashader import datashade, spread
from holoviews.element import tiles
hv.extension('bokeh')

pts = spread(datashade(hv.Points(df, ['x', 'y']), cmap="white", width=300, height=100), px=3)

tiles.EsriImagery() * pts
Out[3]:

If you are using GeoViews, you can get the same effect by calling gv.operation.project. With GeoViews, you can also declare your object to be in lon,lat coordinates natively (from cartopy import crs ; gv.Points(df, ['longitude', 'latitude'], crs=crs.PlateCarree())) and let GeoViews then reproject the points as needed, but dynamic reprojection will be much slower for interactive use than projecting them in bulk ahead of time.

Generate Terrain Data

The rest of the geo-related functions focus on raster data (or rasterized data, after a previous Datashader step that returns an Xarray object). To demonstrate using these raster-based functions, let's generate some fake terrain as an elevation raster:

In [4]:
import numpy as np, datashader as ds, datashader.geo as dsgeo
from datashader.transfer_functions import shade, stack
from datashader.colors import Elevation

W = 800
H = 600

cvs = ds.Canvas(plot_width=W, plot_height=H, x_range=(-20e6, 20e6), y_range=(-20e6, 20e6))
terrain = dsgeo.generate_terrain(cvs)

shade(terrain, cmap=['black', 'white'], how='linear')
Out[4]:

The grayscale value above shows the elevation linearly in intensity (with the large black areas indicating low elevation), but it will look more like a landscape if we map the lowest values to colors representing water, and the highest to colors representing mountaintops:

In [5]:
shade(terrain, cmap=Elevation, how='linear')
Out[5]:

Hillshade

Hillshade is a technique used to visualize terrain as shaded relief, illuminating it with a hypothetical light source. The illumination value for each cell is determined by its orientation to the light source, which is based on slope and aspect.

In [6]:
illuminated = dsgeo.hillshade(terrain)

shade(illuminated, cmap=['gray', 'white'], alpha=255, how='linear')
Out[6]:

You can combine hillshading with elevation colormapping to convey differences in terrain with elevation:

In [7]:
stack(shade(illuminated, cmap=['gray', 'white'], alpha=255, how='linear'),
      shade(terrain,     cmap=Elevation,         alpha=128, how='linear'))
Out[7]:

Slope

Slope is the inclination of a surface. In geography, slope is amount of change in elevation of a terrain regarding its surroundings.

Datashader's slope function returns slope in degrees. Below we highlight areas at risk for avalanche by looking at slopes around 38 degrees.

In [8]:
risky = dsgeo.slope(terrain)
risky.data = np.where(np.logical_and(risky.data > 25, risky.data < 50), 1, np.nan)

stack(shade(terrain,     cmap=['black', 'white'], how='linear'),
      shade(illuminated, cmap=['black', 'white'], how='linear', alpha=128),
      shade(risky,       cmap='red',              how='linear', alpha=200))
Out[8]:

Aspect

Aspect) is the orientation of slope, measured clockwise in degrees from 0 to 360, where 0 is north-facing, 90 is east-facing, 180 is south-facing, and 270 is west-facing.

Below, we look to find slopes that face close to North.

In [9]:
north_faces = dsgeo.aspect(terrain)
north_faces.data = np.where(np.logical_or(north_faces.data > 350 ,
                                          north_faces.data < 10), 1, np.nan)

stack(shade(terrain,     cmap=['black', 'white'], how='linear'),
      shade(illuminated, cmap=['black', 'white'], how='linear', alpha=128),
      shade(north_faces, cmap=['aqua'],           how='linear', alpha=100))
Out[9]:

NDVI

The Normalized Difference Vegetation Index (NDVI) is a metric designed to detect regions with vegetation by measuring the difference between near-infrared (NIR) light (which vegetation reflects) and red light (which vegetation absorbs).

The NDVI ranges over [-1,+1], where -1 means more "Red" radiation while +1 means more "NIR" radiation. NDVI values close to +1.0 suggest areas dense with active green foliage, while strongly negative values suggest cloud cover or snow, and values near zero suggest open water, urban areas, or bare soil.

For our synthetic example here, we don't have access to NIR measurements, but we can approximate the results for demonstration purposes by using the green and blue channels of a colormapped image, as those represent a difference in wavelength similar to NIR vs. Red.

In [10]:
import xarray as xr

rgba = stack(shade(terrain, cmap=Elevation, how='linear')).to_pil()
r,g,b,a = [xr.DataArray(np.flipud(np.asarray(rgba.getchannel(c))))/255.0 
           for c in ['R','G','B','A']]

ndvi = dsgeo.ndvi(nir_agg=g, red_agg=b)
shade(ndvi, cmap=['purple','black','green'], how='linear')
Out[10]:

Bump

Bump mapping is a cartographic technique that can be used to create the appearance of trees or other land features, which is useful when synthesizing human-interpretable images from source data like land use classifications.

dsgeo.bump will produce a bump aggregate for adding detail to the terrain.

In this example, we will pretend the bumps are trees, and shade them with green. We'll also use the elevation data to modulate whether there are trees and if so how tall they are.

  • First, we'll define a custom height function to return tree heights suitable for the given elevation range
  • dsgeo.bump accepts a function with only a single argument (locations), so we will use functools.partial to provide values for the other arguments.
  • Bump mapping isn't normally a performance bottleneck, but if you want, you can speed it up by using Numba on your custom height function (from datashader.utils import ngjit, then put @ngjit above def heights(...)).
In [11]:
from functools import partial

def heights(locations, src, src_range, height=20):
    num_bumps = locations.shape[0]
    out = np.zeros(num_bumps, dtype=np.uint16)
    for r in range(0, num_bumps):
        loc = locations[r]
        x = loc[0]
        y = loc[1]
        val = src[y, x]
        if val >= src_range[0] and val < src_range[1]:
            out[r] = height
    return out

T = 300000 # Number of trees to add per call
src = terrain.data
%time trees  = dsgeo.bump(W, H, count=T,    height_func=partial(heights, src=src, src_range=(1000, 1300), height=5))
trees       += dsgeo.bump(W, H, count=T//2, height_func=partial(heights, src=src, src_range=(1300, 1700), height=20))
trees       += dsgeo.bump(W, H, count=T//3, height_func=partial(heights, src=src, src_range=(1700, 2000), height=5))

tree_colorize = trees.copy()
tree_colorize.data[tree_colorize.data == 0] = np.nan
hillshade = dsgeo.hillshade(terrain + trees)

stack(shade(terrain,        cmap=['black', 'white'], how='linear'),
      shade(hillshade,      cmap=['black', 'white'], how='linear', alpha=128),
      shade(tree_colorize,  cmap='limegreen',        how='linear'))
CPU times: user 617 ms, sys: 2 µs, total: 617 ms
Wall time: 616 ms
Out[11]:

Mean

The datashader.mean function will smooth a given aggregate by using a 3x3 mean convolution filter. Optional parameters include passes, which is used to run the mean filter multiple times, and also excludes which are values that will not be modified by the mean filter.

We can use mean to add a coastal vignette to give out terrain scene a bit more character. Notice the water below now has a nice coastal gradient which adds some realism to our scene.

In [12]:
LAND_CONSTANT = 50.0

water = terrain.copy()
water.data = np.where(water.data > 0, LAND_CONSTANT, 0)
water = dsgeo.mean(water, passes=50, excludes=[LAND_CONSTANT])
water.data[water.data == LAND_CONSTANT] = np.nan

stack(shade(terrain,    cmap=['black', 'white'], how='linear'),
      shade(water,      cmap=['aqua',  'white']))
Out[12]:

Full scene

We've now seen several of datashader's geo helper functions for working with elevation rasters.

Let's make a full archipelago scene by stacking terrain, water, hillshade, and tree_colorize together into one output image:

In [13]:
stack(shade(terrain,       cmap=Elevation,          how='linear'),
      shade(water,         cmap=['aqua','white']),
      shade(dsgeo.hillshade(terrain + trees),   cmap=['black', 'white'], how='linear', alpha=128),
      shade(tree_colorize, cmap='limegreen',        how='linear'))
Out[13]:

Proximity

The datashader.spatial.proximity function operates on a given aggregate to produce a new distance aggregate based on target values and a distance metric. The values in the new aggregate will be the distance (according to the given metric) between each array cell (pixel) and the nearest target value in the source aggregate.

A powerful feature of proximity is that you can target specific values in the aggregate for distance calculation, while others are ignored. Play with the target_values parameter below and see the difference of using target_values=[1,2,3] vs. target_values=[2] vs. target_values=[3]

Load data and create ds.Canvas
In [14]:
from datashader.spatial import proximity
from datashader.transfer_functions import dynspread
from datashader.transfer_functions import set_background

df = pd.DataFrame({
   'x': [-13, -11, -5,4, 9, 11, 18, 6],
   'y': [-13, -5, 0, 10, 7, 2, 5, -5]
})

cvs = ds.Canvas(plot_width=W, plot_height=H,
                x_range=(-20, 20), y_range=(-20, 20))
Create Proximity Aggregate
  • Use Canvas.points to create an xarray.DataArray
  • Calculate proximity to nearest non-nan / non-zero elements using datashader.spatial.proximity
In [15]:
points_agg = cvs.points(df, x='x', y='y')
points_shaded = dynspread(shade(points_agg, cmap=['salmon',  'salmon']),
                          threshold=1,
                          max_px=5)
set_background(points_shaded, 'black')
Out[15]:
Create proximity grid for all non-zero values
In [16]:
proximity_agg = proximity(points_agg)

stack(shade(proximity_agg, cmap=['darkturquoise', 'black'], how='linear'),
      points_shaded)
Out[16]:
In [17]:
line_agg = cvs.line(df, x='x', y='y')
line_shaded = dynspread(shade(line_agg, cmap=['salmon',  'salmon']),
                          threshold=1,
                          max_px=2)
set_background(line_shaded, 'black')
Out[17]:
In [18]:
line_proximity = proximity(line_agg)
stack(shade(line_proximity, cmap=['darkturquoise', 'black'], how='linear'),
      line_shaded)
Out[18]:
Transform Proximity DataArray

Like the other Datashader spatial tools, the result of proximity is an xarray.DataArray with a large API of potential transformations.

Below is an example of using DataArray.where() to apply a minimum distance and maximum distance.

In [19]:
where_clause = (line_proximity > 1) & (line_proximity < 1.1)
proximity_shaded = shade(line_proximity.where(where_clause), cmap=['darkturquoise', 'darkturquoise'])
proximity_shaded = set_background(proximity_shaded, 'black')
stack(proximity_shaded, line_shaded)
Out[19]:

Viewshed

The datashader.spatial.viewshed function operates on a given aggregate to calculate the viewshed (the visible cells in the raster) for the given viewpoint (observer) location.

The visibility model is the following: Two cells are visible to each other if the line of sight that connects their centers does not intersect the terrain. If the line of sight does not pass through the cell center, elevation is determined using bilinear interpolation.

Simple Viewshed Example
  • The example below creates a datashader aggregate from a 2d normal distribution.
  • To calculate the viewshed, we need an observer location.
  • This location is indicated by the orange point in the upper-left of the plot.
In [20]:
from datashader.spatial import proximity
from datashader.spatial import viewshed

from datashader.transfer_functions import dynspread
from datashader.transfer_functions import set_background

OBSERVER_X = -12.5
OBSERVER_Y = 10

H = 400
W = 400

canvas = ds.Canvas(plot_width=W, plot_height=H,
                   x_range=(-20, 20), y_range=(-20, 20))

normal_df = pd.DataFrame({
   'x': np.random.normal(.5, 1, 10000000),
   'y': np.random.normal(.5, 1, 10000000)
})
normal_agg = canvas.points(normal_df, 'x', 'y')
normal_agg.values = normal_agg.values.astype("float64")
normal_shaded = shade(normal_agg)

observer_df = pd.DataFrame({'x': [OBSERVER_X], 'y': [OBSERVER_Y]})
observer_agg = canvas.points(observer_df, 'x', 'y')
observer_shaded = dynspread(shade(observer_agg, cmap=['orange']),
                            threshold=1, max_px=4)

normal_illuminated = dsgeo.hillshade(normal_agg)
normal_illuminated_shaded = shade(normal_illuminated, cmap=['black', 'white'], 
                                  alpha=128, how='linear')

stack(normal_illuminated_shaded, observer_shaded)
Out[20]:
Calculate viewshed using the observer location
In [21]:
# Will take some time to run...
%time view = viewshed(normal_agg, x=OBSERVER_X, y=OBSERVER_Y)

view_shaded = shade(view, cmap=['white', 'red'], alpha=128, how='linear')

stack(normal_illuminated_shaded, observer_shaded, view_shaded)                         
CPU times: user 536 ms, sys: 24.1 ms, total: 560 ms
Wall time: 558 ms
Out[21]:
Viewshed on Terrain
  • Let's take the example above and apply it to our terrain aggregate.
  • Notice the use of the observer_elev argument, which is the height of the observer above the terrain.
In [22]:
from datashader.spatial import viewshed

W = 600
H = 400

cvs = ds.Canvas(plot_width=W, plot_height=H, x_range=(-20e6, 20e6), y_range=(-20e6, 20e6))
terrain = dsgeo.generate_terrain(cvs)
terrain_shaded = shade(terrain, cmap=Elevation, alpha=128, how='linear')

illuminated = dsgeo.hillshade(terrain)

OBSERVER_X = 0.0
OBSERVER_Y = 0.0

observer_df = pd.DataFrame({'x': [OBSERVER_X],'y': [OBSERVER_Y]})
observer_agg = cvs.points(observer_df, 'x', 'y')
observer_shaded = dynspread(shade(observer_agg, cmap=['orange']),
                            threshold=1, max_px=4)

stack(shade(illuminated, cmap=['black', 'white'], alpha=128, how='linear'),
      terrain_shaded,
      observer_shaded)
Out[22]:
In [23]:
%time view = viewshed(terrain, x=OBSERVER_X, y=OBSERVER_Y, observer_elev=100)

view_shaded = shade(view, cmap='fuchsia', how='linear')
stack(shade(illuminated, cmap=['black', 'white'], alpha=128, how='linear'),
      terrain_shaded,
      view_shaded,
      observer_shaded)
CPU times: user 384 ms, sys: 44 ms, total: 428 ms
Wall time: 426 ms
Out[23]:

The fuchsia areas are those visible to an observer of the given height at the indicated orange location.

Zonal Statistics

Zonal statistics allows for calculating summary statistics for specific areas or zones within a datashader aggregate. Zones are defined by creating an integer aggregate where the cell values are zone_ids. The output of zonal statistics is a Pandas dataframe containing summary statistics for each zone based on a value raster.

Imagine the following scenario:

  • You are a hiker on a six-day-trip.
  • The path for each day is defined by a line segement.
  • You wish to calculate the max and min elevations for each hiking segment as a Pandas dataframe based on an elevation dataset.
In [24]:
from datashader.colors import Set1

W = 800
H = 600

cvs = ds.Canvas(plot_width=W, plot_height=H, x_range=(-20, 20), y_range=(-20, 20))

terrain = dsgeo.generate_terrain(cvs)
terrain_shaded = shade(terrain, cmap=Elevation, alpha=128, how='linear')

illuminated = dsgeo.hillshade(terrain)
illuminated_shaded = shade(illuminated, cmap=['gray', 'white'], alpha=255, how='linear')

zone_df = pd.DataFrame({
   'x': [-11, -5, 4, 12, 14, 18, 19],
   'y': [-5, 4, 10, 13, 13, 13, 10],
   'trail_segement_id': [11, 12, 13, 14, 15, 16, 17]
})

zones_agg = cvs.line(zone_df, 'x', 'y', ds.sum('trail_segement_id'))
zones_agg.values = np.nan_to_num(zones_agg.values, copy=False).astype(np.int)
zones_shaded = dynspread(shade(zones_agg, cmap=Set1), max_px=5)

stack(illuminated_shaded, terrain_shaded, zones_shaded)
Out[24]:
In [25]:
from datashader.spatial import zonal_stats

zonal_stats(zones_agg, terrain)
Out[25]:
mean max min std var
11 1321.919879 1428.384240 1228.065037 53.409131 2852.535316
12 1489.600166 2399.556289 1220.694206 310.177594 96210.140061
13 1711.632114 2430.430303 1368.922124 278.369664 77489.669636
14 1355.661087 1368.952389 1346.856316 6.249734 39.059170
15 1381.950485 1443.856734 1319.891060 38.918956 1514.685125
16 1486.314866 1709.222348 1367.673562 120.950684 14629.068060
Calculate custom stats for each zone
In [26]:
stats = dict(elevation_change=lambda zone: zone.max() - zone.min(),
             elevation_min=np.min,
             elevation_max=np.max)

zonal_stats(zones_agg, terrain, stats)
Out[26]:
elevation_change elevation_min elevation_max
11 200.319203 1228.065037 1428.384240
12 1178.862083 1220.694206 2399.556289
13 1061.508179 1368.922124 2430.430303
14 22.096073 1346.856316 1368.952389
15 123.965674 1319.891060 1443.856734
16 341.548786 1367.673562 1709.222348

Here the zones are defined by line segments, but they can be any spatial pattern, and in particular can be any region computable as a Datashader aggregate.

References


Right click to download this notebook from GitHub.